Abstract

Transposable elements (TEs) are major components of both prokaryotic and eukaryotic genomes and play a significant role in their
evolution. In this study, we have identified new prokaryotic DDE transposase families related to the eukaryotic Mutator-like
transposases. These genes were retrieved by cascade PSI-Blast using as initial query the transposase of the streptococcal integrative and
conjugative element (ICE) TnGBS2. By combining secondary structure predictions and protein sequence alignments, we predicted the
DDE catalytic triad and the DNA-binding domain recognizing the terminal inverted repeats. Furthermore, we systematically
characterized the organization and the insertion specificity of the TEs relying on these prokaryotic Mutator-like transposases (p-MULT) for
their mobility. Strikingly, two distant TE families target their integration upstream of σA-dependent promoters. This allowed us to identify
a transposase sequence signature associated with this unique insertion specificity and to show that the dissymmetry between the two
inverted repeats is responsible for the orientation of the insertion. Surprisingly, while DDE transposases are generally associated with
small and simple transposons such as insertion sequences (ISs), p-MULT-encoding TEs show an unprecedented diversity with several
families of ISs, transposons, and ICEs ranging in size from 1.1 to 52 kb.
Introduction

Since their discovery by Barbara McClintock in the 1940s, 
transposable elements (TEs) have gradually attracted increas- 
ing interest. TEs were first thought to be potentially harmful 
parasitic entities and now are recognized as major contribu- 
tors to genome evolution. TEs have been found in nearly all 
sequenced organisms where they can represent an important 
proportion of the host genome. A recent analysis of the an- 
notation of 10 million protein-encoding genes in sequenced 
eukaryotic, archaeal, bacterial, and viral genomes and meta- 
genomes revealed that transposases are the most abundant 
and the most ubiquitous genes in nature (Aziz et al. 2010). 

Transposases, the enzymes catalyzing transposition of DNA 
segments, are classified in phylogenetically and structurally
unrelated families, and DDE transposases represent one of
the major classes. DDE transposases show similar catalytic 
domain architectures with a conserved triad of essential 
amino acids (Asp, Asp, and Glu) which coordinates a divalent 
metal ion (Haren et al. 1999; Curcio and Derbyshire 2003). 
This catalytic domain is associated with a DNA-binding region 
generally located in the N-terminal part of the protein respon- 
sible for the recognition of the terminal inverted repeats (IRs) 
at the TE extremities which correctly localizes the transposase 
for strand cleavage and transfer in the transposition reaction. 
DDE transposases are present in all three domains of life: 
eukaryotes, eubacteria, and archaea. Indeed, the integrases
of retroviruses and that of the Escherichia coli bacteriophage
Mu represent extensively studied classes of DDE transposases
(Montano et al. 2012). Although DDE transposases are structurally
linked, they show an overall low sequence conservation,
leading to numerous unannotated or misannotated representatives
in genome databases. In eukaryotes, comparative analysis
of DDE transposases led to the identification of 19
superfamilies (Jurka et al. 2005). A recent phylogenetic anal- 
ysis suggested that all eukaryotic cut-and-paste transposable 
element superfamilies have a common evolutionary origin and 
define three major phyla (Yuan and Wessler 2011). Among 
these groups, the highly mutagenic Mutator and Mutator-like 
elements (MULEs) represent a diverse family that is related to 
the prokaryotic IS256 family (Eisen et al. 1994).

In prokaryotes, insertion sequences (ISs) are the simplest 
and the most abundant autonomous TEs. They were defined 
as TEs that only code the functions required for their mobility: 
a transposase gene surrounded by IRs that define the borders 
of the mobile DNA. ISs have a dedicated repository database,
ISfinder (www-is.biotoul.fr, last accessed January 27, 2014), 
that contains more than 4,000 carefully annotated ISs (Siguier 
et al. 2006). Some prokaryotic TEs harbor different "passen- 
ger genes," implicated in regulatory or accessory functions 
such as antibiotic resistance genes which confer a selective 
advantage to the host. However, the organization of these 
transposons may be even more complex. We have recently 
characterized a new family of TEs in streptococci, the TnGBS 
family, encoding a DDE transposase associated with different 
conjugative machineries that promote their horizontal transfer 
(Brochet et al. 2009; Guerillot et al. 2013). This represents the
first family of integrative and conjugative elements (ICEs) in
which the phage-like integrase responsible for the excision
and integration of the element is substituted by a DDE transposase.
TnGBSs were shown to transpose specifically 15-17 bp
upstream of different σA promoters (Brochet et al.
2009). Similarity searches of the public databases revealed IS 
elements expressing transposases related to TnGBS transpo- 
sases (Brochet et al. 2009). These enzymes were not related to 
any known transposase. 

Here, by a cascade iterated Blast search we have ex- 
panded our vision of the diversity of p-MULT with the dis- 
covery of four new families, in addition to the IS256 family. 
TnGBS and related ISs transposases represent one of these 
new families. Similarity and secondary structure predictions 
allowed us to determine that these five families share an 
RNase H fold and to identify, unambiguously, the catalytic 
triad as well as the IR DNA-binding regions. By systematic 
analysis of the genomic context we showed that these 
transposases are responsible for the mobility of ISs both in 
eubacteria and in archaea but also of ICEs previously de- 
scribed in Mycoplasma (Dordet Frisoni et al. 2013). The 
identification of a new family of Mutator-like elements sharing
the same insertion specificity as TnGBS upstream of σA
promoters provides further insights into this unusual property
among prokaryotic transposases.



Materials and Methods 

Cascade PSI-Blast Search of Transposases Related to 
Gbs1118 

The primary protein sequence of TnGBS2 transposase 
(Gbs1118) was used as an initial query in a PSI-Blast 
(Altschul et al. 1997) search against the NCBI nonredundant 
protein sequence database. Two rounds of PSI-Blast 
searches were performed without low complexity filter 
and with otherwise default parameters. Protein hits with 
an E-value above 0.005 and query coverage <60% were 
filtered out. Retained hits were then aligned using the 
MAFFT algorithm (Katoh and Standley 2013) with default 
parameters and a tree was built with Jalview using the av- 
erage distances calculated with the BLOSUM62 matrix 
(Waterhouse et al. 2009). Based on this tree, a protein hit 
distantly related to the query was chosen to perform a 
second PSI-Blast search. New protein hits obtained by this 
second round of PSI-Blast were retained and a new query 
for subsequent rounds of PSI-Blast search was selected by 
applying the same filtering, alignment, and tree building 
method. The systematic propagation of PSI-Blast searches 
through distantly related homologs allowed us to overcome 
the query dependence and asymmetry of the classical use of 
PSI-Blast (Bhadra et al. 2006). In total, we performed seven 
rounds of PSI-Blast search with the following queries: 
Gbs1118 (NP_735564), Hore_07130 (YP_002508465),
Krac_8686 (ZP_06969815), MAE_08640 (YP_001655878),
MAGa5060 (YP_003515670), NAS141_01721 (ZP_00964747),
and Calow_0284 (YP_004001685).
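
The hit-filtering step applied after each PSI-Blast round can be sketched as follows. This is a minimal illustration, not the scripts used in the study: the `Hit` record, the function names, and the example hits are hypothetical, and we assume that a hit is retained only when it satisfies both thresholds (E-value of at most 0.005 and query coverage of at least 60%).

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one PSI-Blast hit (not a Biopython class)."""
    accession: str
    evalue: float
    query_start: int  # 1-based, inclusive
    query_end: int    # 1-based, inclusive


def query_coverage(hit, query_length):
    """Fraction of the query covered by the aligned region of the hit."""
    return (hit.query_end - hit.query_start + 1) / query_length


def filter_hits(hits, query_length, max_evalue=0.005, min_coverage=0.60):
    """Keep hits with E-value <= max_evalue and coverage >= min_coverage."""
    return [
        h for h in hits
        if h.evalue <= max_evalue
        and query_coverage(h, query_length) >= min_coverage
    ]


# Invented example hits against a 400-aa query.
hits = [
    Hit("hit_A", 1e-40, 1, 400),  # significant and full length: kept
    Hit("hit_B", 1e-3, 50, 200),  # significant but covers < 60%: dropped
    Hit("hit_C", 0.5, 1, 400),    # full length but not significant: dropped
]
kept = filter_hits(hits, query_length=400)
print([h.accession for h in kept])
```

In a cascade search, the retained hits of one round would then be aligned and a distantly related member chosen as the next query.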

Transposase Family Clustering 

Protein hits retrieved by cascade PSI-Blast were first com- 
pared by all-against-all BlastP comparisons. Similarities with 
an E-value lower than 10"^ were retained to build a simi- 
larity network using the Cytoscape software (Cline et al. 
2007). We applied the force directed layout of Cytoscape 
to visualize the generated similarity graph where each node 
corresponds to a transposase homolog interconnected by 
edges representing the BlastP results. We applied a contin- 
uous mapping of the edge opacity to further weigh rela- 
tionships between transposase homologs (the more opaque 
edges correspond to lower BlastP E-values). Transposase ho- 
mologs were then clustered by using the Markov cluster 
algorithm (MCL) (http://micans.org/mcl/, last accessed 
January 27, 2014) implemented in the clusterMaker plugin 
of Cytoscape (Morris et al. 2011). We converted the edge
weights to -log(E-value) and applied an inflation factor (IF)
of 1.2. This inflation value was chosen as it has been shown
to be effective in clustering other well-defined IS families 
(Siguier et al. 2009). 
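
To make the clustering step concrete, the sketch below converts E-values to edge weights and runs a bare-bones Markov clustering on a toy graph. It is a simplified pure-Python illustration, not the clusterMaker implementation used in the study. The base-10 logarithm, the clamping of E-values of 0, the self-loop weights, and the toy graph are all assumptions made for the example.

```python
import math


def evalue_to_weight(evalue, floor=1e-180):
    """Edge weight = -log10(E-value); E-values of 0 are clamped to `floor`."""
    return -math.log10(max(evalue, floor))


def normalize_columns(m):
    """Scale every column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(edges, n, inflation=1.2, max_iter=200, tol=1e-9):
    """Cluster n nodes from weighted undirected edges {(i, j): weight}."""
    m = [[0.0] * n for _ in range(n)]
    for (i, j), w in edges.items():
        m[i][j] = m[j][i] = w
    for i in range(n):
        # Self-loop as strong as the best edge (assumed convention).
        m[i][i] = max(m[i]) or 1.0
    m = normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: one step of random-walk spreading (matrix squaring).
        expanded = [
            [sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        # Inflation: element-wise power, then column renormalization.
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded]
        )
        delta = max(
            abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n)
        )
        m = inflated
        if delta < tol:
            break
    # At convergence, each non-empty row lists the members of one cluster.
    clusters = {frozenset(j for j, x in enumerate(row) if x > 1e-6) for row in m}
    return sorted(sorted(c) for c in clusters if c)


# Toy graph: two groups of three proteins, all pairs within a group
# linked by a BlastP E-value of 1e-50, no link between groups.
w = evalue_to_weight(1e-50)
edges = {(0, 1): w, (0, 2): w, (1, 2): w, (3, 4): w, (3, 5): w, (4, 5): w}
clusters = mcl(edges, n=6)
print(clusters)
```

A low inflation value such as 1.2 yields coarse clusters; on densely connected real networks the iteration may need many more rounds to converge than on this toy example.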
Identification of Transposable Elements and of Their
Insertion Sites 

The identification of TEs was performed semiautomatically by 
using scripts written in Python programming language and 
using the Biopython module (www.biopython.org/, last 
accessed January 27, 2014). First, the DNA coding sequences 
of all transposase homologs were retrieved together with 
400 bp of up- and downstream sequences. The extracted 
DNA sequences were then used as BlastN queries against 
the complete or draft genome sequences in which the TE is 
inserted. If the BlastN result gave more than three hits, the 
most likely TE boundaries were determined automatically 
based on the majority start and end of high-scoring segment
pairs (HSPs). BlastN results giving multiple HSPs that align
with the first base of the query likely correspond to larger
TEs encoding several open reading frames (ORFs) upstream
of the transposase gene. For these transposases, the length of
the surrounding DNA sequence was extended until the ex- 
tremities of the TE were reached. For transposases present 
in less than three copies, the extracted DNA sequence was 
compared by BlastN with the genomic sequence of other 
isolates from the same species, if available. All TE boundaries 
were manually validated by the identification of IRs and 
direct repeats (DRs). All transposons and ISs identified in 
this study (supplementary table S1, Supplementary 
Material online) were submitted to the ISfinder database 
(Siguier et al. 2006). 
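
The majority rule used to propose TE boundaries from BlastN results can be illustrated as below. This is a sketch under stated assumptions rather than the study's own script: HSPs are represented as hypothetical `(query_start, query_end)` tuples, and "more than three hits" is taken to mean at least four HSPs.

```python
from collections import Counter


def majority_boundaries(hsps, min_hits=4):
    """Return the most frequent (start, end) of the HSPs on the query.

    hsps: list of (query_start, query_end) tuples, one per BlastN HSP.
    Returns None when there are too few HSPs for a majority call.
    """
    if len(hsps) < min_hits:
        return None
    start, _ = Counter(s for s, _ in hsps).most_common(1)[0]
    end, _ = Counter(e for _, e in hsps).most_common(1)[0]
    return start, end


# Invented HSP coordinates for a transposase gene extracted with flanks.
hsps = [(401, 1725), (401, 1725), (398, 1725), (401, 1730), (401, 1725)]
boundaries = majority_boundaries(hsps)
print(boundaries)
```

Boundaries proposed this way would still need validation through the identification of IRs and DRs, as described above.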

Insertion sites and insertion specificity were analyzed upon
extracting 300 bp sequences on both sides of the validated
TEs after filtering identical insertions. These regions were
scanned for putative σA promoters using the PPP software
(http://bioinformatics.biol.rug.nl/websoftware/ppp, last
accessed January 27, 2014). Hidden Markov models of the
lactococcal σA-dependent RNA polymerase binding site, allowing
a 15-19 bp distance between the canonical -35 and
-10 promoter elements, were constructed using alignments
of known σA binding sites (Zomer et al. 2007).
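
The spacing constraint between the -35 and -10 elements can be illustrated with a much simpler scan than the hidden Markov models used by PPP. The sketch below is a consensus-matching stand-in, not the PPP method: it assumes the canonical TTGACA and TATAAT hexamers, a 15-19 bp spacer, and an arbitrary tolerance of two mismatches in total. The function names and the test sequence are hypothetical.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq, box35="TTGACA", box10="TATAAT",
                   spacers=range(15, 20), max_mismatches=2):
    """Return (pos35, spacer, mismatches) for candidate promoters.

    pos35 is the 0-based start of the -35 hexamer on the given strand.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(box35) + 1):
        mm35 = mismatches(seq[i:i + len(box35)], box35)
        if mm35 > max_mismatches:
            continue
        for spacer in spacers:
            j = i + len(box35) + spacer
            if j + len(box10) > len(seq):
                break
            total = mm35 + mismatches(seq[j:j + len(box10)], box10)
            if total <= max_mismatches:
                hits.append((i, spacer, total))
    return hits


# Invented sequence with a perfect promoter and a 17-bp spacer.
seq = "GC" * 5 + "TTGACA" + "G" * 17 + "TATAAT" + "GC" * 5
promoter_hits = find_promoters(seq)
print(promoter_hits)
```

Only one strand is scanned here; a complete analysis would also scan the reverse complement.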

Phylogeny and Transposase Sequences Analysis 

For phylogenetic reconstruction, transposase sequences with 
BlastP similarity lower than 98% and representative of the 
diversity of TEs were retained. The transposase sequences 
were aligned using MAFFT version 6 with the E-INS-I 
method (Katoh and Standley 2013) and manually checked 
using Jalview (Waterhouse et al. 2009). Aligned positions 
with more than 60% of gaps were removed before construct- 
ing the tree. Phylogenetic relationships were inferred by 
Maximum likelihood (ML) using MEGA5 (Tamura et al. 
2011). Prior to ML analysis, the best protein substitution 
model of Jones-Taylor-Thornton (JTT) was selected according 
to the Akaike information criterion given by the ProtTest 
software (Darriba et al. 2011). Branch support was
determined by 100 bootstrap replications. The level of conservation
in protein sequence alignments was plotted using the
plotcon application (http://emboss.sourceforge.net/, last
accessed January 27, 2014). Secondary structure predictions
were performed using the jpred3 server
(http://www.compbio.dundee.ac.uk/jpred, last accessed
January 27, 2014) (Cole et al. 2008).
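
The removal of gap-rich alignment positions before tree construction can be sketched as follows. The function name and the toy alignment are hypothetical; the alignment is assumed to be a list of equal-length strings with "-" as the gap character.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.60, gap="-"):
    """Remove columns in which more than max_gap_fraction of rows are gaps."""
    n_seqs = len(alignment)
    keep = [
        col for col in range(len(alignment[0]))
        if sum(seq[col] == gap for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]


# Invented alignment: the third column is a gap in 4 of 5 sequences (80%).
alignment = ["MK-LD", "MK-LE", "MKALD", "M--LD", "MK-ID"]
trimmed = drop_gappy_columns(alignment)
print(trimmed)
```

Columns with exactly 60% gaps are kept, in line with the "more than 60%" wording.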

Results 

Discovery of New Families of Mutator-Related 
Transposases 

Transposases from TnGBSs and related ISs show a PFAM rve
retroviral integrase domain with a low score (Brochet et al.
2008). However, we did not retrieve known transposase sequences
by BlastP search at NCBI or at the ISfinder Web site.
Because BlastP has a low sensitivity, we performed a cascade
PSI-Blast search in the nonredundant protein database to
retrieve more distantly related protein sequences. After applying
the filters described in the Materials and Methods section, we
retained 731 protein hits. Interestingly, although 70% of
these (517) are currently described as hypothetical proteins,
104 were annotated as Mutator-like transposases, 23 as transposases
of the IS256 family, 7 as ISH6 transposases, and 80 as
transposases of unknown families. This iterative search suggests
that, contrary to our first analysis, TnGBS transposases
are distantly related to known transposase families.

We then built a similarity graph to visualize the relatedness
between protein sequences and to decipher the overall
relationships of putative and characterized transposases. In this
network, protein sequences are represented as nodes that are
connected by edges weighted according to their BlastP
E-value (fig. 1A). All hits form an interlinked network, in
agreement with the overall relatedness of all protein hits observed in
the course of the PSI-Blast analysis. In particular, it shows the
relatedness of TnGBS transposases with the Mutator transposase
superfamily and with transposases of the previously identified
IS256 and ISH6 families. By using the MCL, the protein
sequences of the similarity network were clustered in five
groups, each defining a family of transposases that we
named p-MULT 1-5 for prokaryotic Mutator-like transposase.
Proteins previously identified as transposases of the IS256 and
ISH6 families are members of two different clusters, p-MULT 1
and p-MULT 2. TnGBS transposases are members of a large
cluster encompassing 320 proteins (p-MULT 3) that is closely
linked to two other clusters of 186 (p-MULT 4) and 31
(p-MULT 5) proteins, respectively (fig. 1). Strikingly, except
for the putative p-MULT 5 transposases that are only
connected to TnGBS transposases (p-MULT 3), the four other
clusters are interconnected. The phylogenetic tree constructed
with representatives of each family confirms this clustering
(fig. 2). Our analysis extends a previous study reporting that
ISH6 transposases are distantly related to the IS256 family
(Filee et al. 2007). The discovery of three additional putative
transposase families shows that, together with IS256 and ISH6
transposases, Mutator-like transposases in prokaryotes are
much more diverse and widespread than previously thought.

Fig. 1. Similarity network and MCL clustering of prokaryotic Mutator-like transposases. (A) Similarity network of TnGBS transposase-related protein
sequences weighted according to all-against-all BlastP E-values. Each node represents a protein sequence obtained by the cascade PSI-Blast search. More
opaque edges correspond to greater similarity according to the BlastP E-value. Each node was colored according to the MCL clustering of the network. (B)
Similarity network of the five p-MULT families defined by MCL clustering of the all-against-all similarity graph of figure 1A.

p-MULTs Are Encoded by Diverse Types of Mobile 
Elements: ISs, Transposons, and ICEs That Share 
Transposition Features 

To systematically identify the TEs that encode transposases of
the five p-MULT families, we analyzed the DNA regions on
both sides of the transposase genes for IRs and direct repeats
(DRs). In total, we accurately identified 424 TEs in addition to
the 58 TnGBS-related ICEs previously described (supplementary
table S1, Supplementary Material online). The 109 TEs
encoding a p-MULT 1 transposase have the genetic organization of
ISs (fig. 3A). Based on BLAST analysis performed on the ISfinder
database, they all belong to the IS256 family. These ISs are
widely distributed among bacterial phyla (Proteobacteria,
Firmicutes, Chlamydiae, Actinobacteria, and Deferribacteres)
and are also present in the archaeal phylum Euryarchaeota.
Similarly, the 11 TEs that encode ISH6-related transposases
(p-MULT 2) are ISs (fig. 3B). Six are new representatives of this
small group. Interestingly, although the ISH6 group was first
identified in archaea of the Euryarchaeota phylum (Filee et al.
2007), we identified multiple copies of one IS belonging to this
family in three different uncultured Desulfobacterium strains
that are members of the proteobacterial phylum (ISDesp5,
fig. 2).

In the three other families, we identified more complex TEs.
TEs encoding p-MULT 3 transposases include the TnGBS ICEs
and 168 ISs that form a new IS family that we named ISLre2
(fig. 3C). Unlike TnGBSs, which are restricted to the streptococcal
genus, ISLre2 ISs were found in a broad variety of Firmicute
species, and eight are present in multiple copies in a
Fusobacterium and two Synergistetes strains, respectively,
belonging to two distantly related phyla.

The 115 TEs encoding p-MULT 4 transposases were found
in numerous phyla: Proteobacteria, Cyanobacteria, Nitrospirae,
Bacteroidetes, Actinobacteria, Planctomycetes, and
Chloroflexi. According to the transposase phylogeny, three
distinct groups of ISs or simple transposons, that we named
ISAzba1, ISMich2, and ISKra4, are clearly distinguishable.
Fig. 2. Phylogenetic tree of prokaryotic Mutator-like transposases. Each p-MULT clade is colored according to figure 1. p-MULT 1 and p-MULT 2
transposases are encoded by ISs of the IS256 and ISH6 families, respectively. p-MULT 3 transposases are encoded by both the TnGBS family and the ISLre2
family. p-MULT 4 transposases, encoded by both transposons and ISs, form three different lineages: ISAzba1, ISMich2, and ISKra4. Transposons of the ISAzba1 group
encoding a pRiA4_Orf3-like protein are indicated by blue dots. ISs of the ISMich2 group with a predicted -1 frameshift in the transposase gene are indicated
by pink dots. TE names are indicated at the extremity of the tree branches. TEs with a predicted σA promoter at a distance of 13-17 bp from the IR-genome
junctions in more than 20% of their insertion sites (supplementary table S2, Supplementary Material online) are indicated by small black dots.



Among the ISAzba1 group, 12 TEs forming a monophyletic
branch harbor one to three ORFs in addition to the transposase
gene (figs. 2 and 3D). They share an ORF similar to one of
unknown function in Agrobacterium rhizogenes plasmid
pRiA4 (Endoh et al. 1990). Among these TEs, five carry a
serine recombinase gene and one a tyrosine recombinase
gene (supplementary table S1, Supplementary
Material online). As shown for the Tn3 family, these
site-specific recombinases might be involved in the resolution of
cointegrates generated by the transposition (Kostriken et al.
1981). Similarly, four ISKra4 elements encode different genes
of unknown function in addition to their transposases (fig. 3D
and supplementary table S1, Supplementary Material online).
ISLdr1 from Legionella drancourtii LLAP12 encodes a putative
group II intron reverse transcriptase (fig. 3D). Conversely, all
ISMich2 TEs are ISs. However, in this group, the transposases
are encoded by two ORFs, and a -1 frameshift is sufficient to
restore the expression of a full-length transposase. As this
feature is conserved in the ISMich2 group, it likely reflects a
mode of regulation of transposition by programmed
translational frameshifting as described in the IS1 or IS3 families
(Chandler and Fayet 1993; Nagy and Chandler 2004).

Fig. 3. Diversity of transposable elements encoding transposases of the five p-MULT families. Representation of the gene maps and structural/
organizational diversity of the five TE families: (A) p-MULT 1, IS256 family; (B) p-MULT 2, ISH6 family; (C) p-MULT 3, TnGBS/ISLre2 family; (D) p-MULT 4,
ISAzba1, ISMich2, and ISKra4 families; (E) p-MULT 5, Mycoplasma ICE family. Arrows represent genes. Predicted functions of the gene products are
indicated according to a color code shown at the bottom of the figure. Putative origins of replication are represented by yellow triangles.

The 31 TEs encoding p-MULT 5 transposases were identified
in the genomes of Mycoplasma species. We precisely
characterized the boundaries of 21 elements. Surprisingly,
they are all of large size, ranging from 7 to 37 kb (fig. 3E).
Eight of these are present in two or three copies. Except for
the 7-kb-long element, they all encode homologs of type IV
secretion system proteins responsible for the mobility of
conjugative elements (fig. 3E). Thirteen of these elements
were previously predicted as ICEs despite the absence of an
identifiable integrase gene. The protein responsible for their
integration was not known (Marenda et al. 2006). These ICEs
share little sequence conservation and have diverse
organizations, as illustrated by ten representatives depicted in figure
3E. Strikingly, the most unifying feature of these elements is
the conservation of a p-MULT 5 putative transposase gene
upstream of one of the two IRs. This strongly suggests a role
for this transposase in Mycoplasma ICE mobility. This
prediction was recently experimentally demonstrated for ICEA of
Mycoplasma agalactiae 5632 (Dordet Frisoni et al. 2013).
ICEA is transferable by conjugation and its excision and
integration involve a p-MULT 5 transposase encoded by CDS22.
These ICEs, like the TnGBSs, therefore rely on a DDE transposase
of the Mutator family and not a tyrosine or serine
recombinase for their mobility. Based on the identification of the
putative transposase gene, we identified five new ICEs in
three Mycoplasma hyopneumoniae strains and one in
Mycoplasma capricolum subsp. capricolum strain ATCC
27343.
Despite their genetic diversity, TEs encoding p-MULT transposases
share several features. We identified 18- to 39-bp-long
IRs at their extremities with a conserved terminal cytosine
residue (supplementary fig. S1A, Supplementary Material
online). Like eukaryotic MULEs, the insertion of these
prokaryotic TEs generates DRs of 8 or 9 bp (supplementary table S1,
Supplementary Material online). We have shown that TnGBSs
transpose by production of an extrachromosomal circular
form, which acts as a substrate for a plasmid-like replication
and conjugative transfer. Similar circular forms of ISLre2 in
Lactobacillus reuteri JCM 1112 have been detected by
polymerase chain reaction (data not shown). IS256 family ISs also
transpose via a circular intermediate (Loessner et al. 2002) as
do Mycoplasma ICEA and ICEF-I. In these circular forms, the
terminal IRs are separated by 6-10 bp sequences derived from
one of the two flanking DNA sequences (Calcutt et al. 2002;
Marenda et al. 2006; Brochet et al. 2009; Dordet Frisoni et al.
2013; Guerillot et al. 2013). These data suggest that the
Mutator-like transposases of the five families catalyze
transposition using a similar mechanism involving the formation of
a circular intermediate.
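
The direct repeats generated on insertion can be checked by comparing the bases immediately flanking a TE. The sketch below is an illustrative helper, not part of the study's pipeline: coordinates are assumed to be 0-based with an exclusive end, only exact repeats of 9 or 8 bp are tested (longest first), and the example sequence is invented.

```python
def find_direct_repeat(genome, te_start, te_end, lengths=(9, 8)):
    """Return the direct repeat flanking a TE, or None if there is none.

    genome: host sequence containing the TE.
    te_start, te_end: 0-based TE coordinates, te_end exclusive.
    """
    for k in lengths:
        if te_start < k:
            continue
        left = genome[te_start - k:te_start]
        right = genome[te_end:te_end + k]
        if len(right) == k and left == right:
            return left
    return None


# Invented locus: an 8-bp target site duplicated on both sides of the TE.
te = "CCTTAACCTTAAGG"
genome = "GGGGGG" + "ATGCATGC" + te + "ATGCATGC" + "TTTT"
te_start = genome.index(te)
dr = find_direct_repeat(genome, te_start, te_start + len(te))
print(dr)
```

Real insertions may carry mismatches in the repeats, which is one reason manual validation of TE boundaries remains useful.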

Insertion Specificity for Upstream Promoter Regions Is
Shared by the p-MULT 3 Family and One Lineage of
p-MULT 4

We performed a systematic analysis of the insertion specificity
of the five p-MULT families by extracting cognate genomic
DNA sequences next to the right and left IRs of each TE
(supplementary table S2, Supplementary Material online). In
total, we obtained 2,833 IR-genomic DNA junctions. TnGBS
and the related ISs are preferentially inserted 15-17 bases
upstream of the -35 region of σA promoters (Brochet et al.
2009; Guerillot et al. 2013). To determine whether other
p-MULT families show a similar insertion specificity, we first
searched for putative σA promoter sequences on both sides
of the TEs (supplementary table S2, Supplementary Material
online). We then analyzed the position-specific enrichment of
promoter detection relative to the end of the IRs. The result of
this analysis is depicted in figure 4 for the insertion sites of TEs
encoding p-MULT 1, 3, and 4. For the p-MULT 3 family, a
strong relative increase of promoter detection is observed at
a distance of 16 bp from IR-right (fig. 4A), whereas no
enrichment at any specific position was detected for TEs encoding
p-MULT 1 transposases (fig. 4C). This confirms the oriented
insertion at a fixed distance from σA promoters catalyzed by
p-MULT 3. More interestingly, we detected a similar but lower
signal for TEs encoding p-MULT 4 transposases (fig. 4B).
Therefore, some p-MULT 4 transposases share the p-MULT
3 insertion specificity for upstream promoter regions
(supplementary tables S1 and S2, Supplementary Material online).
However, for some elements, σA promoters were predicted on
both sides of the element. These 13 ISs (ISKra2, 4, 5, and 6;
ISCasp2 and 3; ISLfe1 and 2; ISHhy1; ISAcce1; IS7\//1; ISUncu20; and
ISDesp4) are characterized by an almost perfect
complementarity (83-100%) between the right and left IRs. Likewise,
ISLbu1 of Leptotrichia buccalis, which encodes a p-MULT 3
transposase, shows perfectly complementary IRs of 24 bp and was the
only TE of this family found inserted in both orientations with
respect to σA promoters (supplementary table S2,
Supplementary Material online). These observations suggest that the
orientation of TE insertions relies on a differential recognition of
the two IRs by the transposase during integration.

Fig. 4. Relative position of putative σA promoters identified in the
DNA region surrounding TE insertion sites. The DNA sequences on both
sides of insertions were extracted and scanned for putative σA promoters
using the PPP software (Zomer et al. 2007). The histogram represents the
ratio of the predicted σA promoters at a given position from the insertion
site to the total number of σA promoters predicted at a maximum distance
of 30 bp. Only the results obtained for p-MULT 3 (A), p-MULT 4 (B), and
p-MULT 1 (C), for which a sufficient number of promoters were predicted,
are represented (4,207, 523, and 1,145 σA promoters, respectively). The
abscissa values correspond to the position of the predicted -35 sequence
of σA promoters relative to the insertion site. Negative values correspond to
the IRl-genome junctions and positive values to IRr-genome junctions.
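
The position-specific ratio plotted in figure 4 can be computed as in the following sketch. It assumes that each predicted promoter has already been reduced to a signed distance between its -35 sequence and the IR-genome junction (negative on the IR-left side, positive on the IR-right side). The function name and the example distances are hypothetical.

```python
from collections import Counter


def promoter_position_ratios(positions, max_distance=30):
    """Ratio of promoters at each position to all promoters within range.

    positions: signed distances (bp) of predicted -35 sequences from the
    IR-genome junction; distances beyond max_distance are ignored.
    """
    within = [p for p in positions if abs(p) <= max_distance]
    if not within:
        return {}
    counts = Counter(within)
    return {p: counts[p] / len(within) for p in sorted(counts)}


# Invented distances: four of the seven promoters within 30 bp lie at +16.
positions = [16, 16, 16, 16, 5, -12, 22, 40]
ratios = promoter_position_ratios(positions)
print(ratios)
```

A sharp peak at one position, as in this toy input, is the kind of signal that indicates insertion at a fixed distance from promoters.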

We did not observe any conserved DNA motif in the DNA
flanking the insertions of TEs encoding p-MULT 1 or p-MULT 5
transposases, in agreement with previous observations
showing that insertion of IS256 and M. agalactiae ICEA is likely
random (Ziebuhr et al. 1999; Dordet Frisoni et al. 2013). For
the TEs encoding p-MULT 2 transposases, we observed a
conservation of the flanking region among the 39 IS-genome
junctions extracted. The consensus sequence of the DR
corresponds to the AT-rich motif AANATNTT (supplementary
fig. S1, Supplementary Material online). Remarkably, we
observed this sequence also at the insertion sites of the distantly
related IS identified in the uncultured Desulfobacterium sp.
(ISDesp5). The conservation of this particular targeting further
supports the grouping of these bacterial ISs as new members
of the ISH6 family.

Conservation of a Mutator-Like Catalytic Domain 

The catalytic triad of DDE transposases consists of two aspartyl
(D) residues and a glutamyl (E) residue, located in a conserved
core that forms a characteristic RNase H-like fold of mixed
α-helices and β-strands (see supplementary fig. S2A,
Supplementary Material online) (Hickman et al. 2010). The
first D residue is located in β1, the second D residue is in or
just after β4, and the third D/E residue in or just before α4.
These three catalytic residues were experimentally confirmed
in the IS256 transposase (Loessner et al. 2002). To identify the
catalytic residues in the five p-MULT families, we combined
sequence alignments and secondary structure predictions.
First, we observed that the sequences of the transposases
from the five p-MULT families align perfectly at the three
catalytic residues of the IS256 transposase (fig. 5). Second,
secondary structure modeling of representatives of the five
transposase families revealed an RNase H-like fold with the
expected positioning of the conserved DDE residues
(supplementary fig. S2A, Supplementary Material online), despite
divergences in the regions between these putative catalytic
residues. Compared with a typical RNase H fold, some DDE
transposase catalytic domains are characterized by the
presence of a β-strand or an α-helical insert between the
second D residue and the E residue (Hickman et al. 2010;
Yuan and Wessler 2011). For the five p-MULT families, a
99- to 138-aa-long α-helical insert was predicted between
the catalytic residues D2 and E, as previously shown in
eukaryotic Mutator transposases (Hua-Van and Capy 2008;
Yuan and Wessler 2011). Altogether these results allowed
us to predict with a high confidence the three catalytic
residues in the transposases of the five p-MULT families (fig. 5).

Fig. 5. Alignment of the protein domains encompassing the catalytic DDE residues in p-MULT. Transposase sequences were aligned by the MAFFT
alignment software (Katoh and Standley 2013) and visualized using Jalview (Waterhouse et al. 2009). The alignment was filtered for redundancy to
subsequently retain a subset of transposases for each p-MULT family representative of their diversity. Only regions surrounding the predicted DDE residues
and the C/D(2)H motif were kept in the alignment. Numbers given in parentheses correspond to the distance in aa residues between the different motifs.
Transposase accession numbers are indicated on the left.

In addition to the predicted catalytic DDE residues, a specific
signature (C/D(2)H) is conserved in all retrieved homologs,
11-19 aa downstream of D2 (fig. 5). This motif is positioned in
the α-helical insert located after the predicted β5 strand in the
five p-MULT families (supplementary fig. S2A, Supplementary
Material online). Although the functional role of this motif is
unknown, it has also been identified with a similar relative
position in the predicted α-helical insert of eukaryotic
Mutator-like transposases (Yuan and Wessler 2011).
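
A search for this signature in a protein sequence might look like the sketch below. Two assumptions are made and should be checked against figure 5: C/D(2)H is read as a C or D residue, two residues of any type, then H; and the 11-19 aa distance is counted from the D2 residue to the first residue of the motif. The function name and the toy sequence are hypothetical.

```python
import re

# Assumed reading of C/D(2)H: [C or D], any two residues, then H.
MOTIF = re.compile(r"[CD].{2}H")


def find_cdh_motif(protein, d2_index, min_offset=11, max_offset=19):
    """Return 0-based starts of motifs lying min..max_offset aa after D2."""
    found = []
    for offset in range(min_offset, max_offset + 1):
        start = d2_index + offset
        if MOTIF.fullmatch(protein[start:start + 4]):
            found.append(start)
    return found


# Invented sequence: D2 at index 10, motif CAAH starting 14 aa downstream.
protein = "A" * 10 + "D" + "A" * 13 + "CAAH" + "A" * 10
motif_starts = find_cdh_motif(protein, d2_index=10)
print(motif_starts)
```

Restricting the search to a window downstream of D2 avoids reporting chance matches elsewhere in the protein.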

Prediction of an N-Terminal DNA-Binding Domain 
Implicated in IR Recognition 

Two additional domains are conserved in the N-terminal part 
of the alignment of the p-MULT sequences (domain N1 and 
N2 in fig. 6 showing the similarity along the aligned se- 
quences). Domain N2, situated upstream the catalytic 



q p-- — ' — ' — ' — I — ' — ' — ■ — ' — r 
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Di D2 C/D(2)H E 

\ I \ I V ; 
Fig. 6. Level of conservation along the alignment of p-MULT sequences. 
The graphical representation of the similarity scores along the 
aligned p-MULT sequences was plotted using the plotcon software of the 
EMBOSS package (www.ebi.ac.uk/Tools/emboss/, last accessed January 
29, 2014). The similarity was calculated by moving a window of 15 aa 
residues along the aligned sequences. The similarity score at each position 
corresponds to the average of all the possible pairwise scores at that position. 
The pairwise scores are taken from the BLOSUM62 matrix. The 
average of the similarity values at each position within the window was 
plotted. 



domain, corresponds to the DNA-binding region shown to 
recognize the terminal IRs of IS256 (fig. 7) (Hennig and 
Ziebuhr 2010). For the five transposase families, an identical 
secondary structure of two α helices of similar length separated 
by a possible turn formed by three residues was predicted 
in this region, and several conserved residues are 
identically positioned in this structure (supplementary fig. 
S1B, Supplementary Material online). Therefore, this region 
probably represents the IR-binding region for the five families. 
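
The conservation profile of figure 6, in which domains N1 and N2 appear, is a sliding-window average of per-column pairwise scores. The Python sketch below shows one way to compute such a profile. It is an approximation for illustration, not the plotcon implementation. The function names are hypothetical, and `pair_score` is a toy identity score standing in for a BLOSUM62 lookup.

```python
from itertools import combinations


def pair_score(a, b):
    """Toy stand-in for a BLOSUM62 lookup (assumption): 1 if identical, else 0."""
    if a == "-" or b == "-":
        return 0.0  # gaps contribute nothing in this simplified scheme
    return 1.0 if a == b else 0.0


def column_scores(alignment, score=pair_score):
    """Average of all pairwise scores in each column (needs >= 2 sequences)."""
    pairs = list(combinations(range(len(alignment)), 2))
    return [
        sum(score(alignment[i][c], alignment[j][c]) for i, j in pairs) / len(pairs)
        for c in range(len(alignment[0]))
    ]


def window_profile(alignment, window=15, score=pair_score):
    """Mean column score over every full window of `window` columns."""
    cols = column_scores(alignment, score)
    return [
        sum(cols[start:start + window]) / window
        for start in range(len(cols) - window + 1)
    ]


# Toy alignment (three equal-length rows), with a window of 2 columns.
toy_alignment = ["ACDE", "ACDF", "ACGF"]
print(window_profile(toy_alignment, window=2))
```

Replacing `pair_score` with a real substitution-matrix lookup would bring the profile closer to the one described in the figure legend. Peaks in the resulting list correspond to conserved stretches such as N1 and N2.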

However, in 5 of the 31 putative transposases of 
Mycoplasma ICEs, we also found a more divergent domain 
at the same position matching the HTH_23 PFAM domain 
(supplementary table S1, Supplementary Material online). 
This domain is present in transcription regulators but has 
also been identified in the DDE transposases of the IS630 
(equivalent to the mariner family) and IS30 families 
(Nagy et al. 2004). This observation suggests a replacement 
of the N2 domain in these five transposases. 

Insertion Specificity Upstream Promoters Is Associated 
with a Specific Transposase Sequence Signature 

p-MULT 4 transposases belong to three different phylogenetic 
lineages. We took advantage of the fact that all p-MULT 
4 transposases that catalyze integration of their cognate element 
upstream of putative σA promoters belong to the ISKra4 
group (fig. 2) to search for particular motifs associated with 
this insertion specificity. We compared the similarity of 
p-MULT 3 transposases with p-MULT 4 transposases of the ISKra4 group 
or of the ISAzba1 and ISMich2 groups (fig. 8) and identified a 
single region, located between the conserved domains N1 and 
N2, that is differentially conserved. This region contains a 
conserved aspartyl residue (fig. 8). Comparison with transposases 
from the other p-MULT families showed that this motif is 
conserved only among the TnGBS, ISLre2, and ISKra4 transposases, 
which catalyze insertion upstream of σA promoters. This 
strongly suggests that the motif is involved in insertion 
specificity. 
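
The comparison described above amounts to superimposing two similarity profiles and looking for stretches where one is clearly higher than the other. The Python sketch below illustrates this on two precomputed profiles. The function name and the `min_delta` and `min_length` thresholds are hypothetical, chosen for the example; the study does not state numerical cut-offs.

```python
def differential_regions(profile_a, profile_b, min_delta=0.2, min_length=3):
    """Return (start, end) pairs (end exclusive) of contiguous runs where
    profile_a exceeds profile_b by at least min_delta.

    The thresholds are illustrative assumptions, not values from the study.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    regions = []
    start = None
    for i, (a, b) in enumerate(zip(profile_a, profile_b)):
        if a - b >= min_delta:
            if start is None:
                start = i  # open a new run
        else:
            if start is not None and i - start >= min_length:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the profiles.
    if start is not None and len(profile_a) - start >= min_length:
        regions.append((start, len(profile_a)))
    return regions


# Toy profiles: similarity with one group (a) versus another group (b).
with_group_a = [0.1, 0.1, 0.8, 0.9, 0.85, 0.1, 0.7]
with_group_b = [0.1, 0.1, 0.2, 0.3, 0.2, 0.1, 0.1]
print(differential_regions(with_group_a, with_group_b))
```

Both profiles must come from alignments sharing the same column coordinates, otherwise positions cannot be compared directly.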

Discussion 

Mutator and Mutator-like transposases constitute one of the major 
superfamilies of transposases in eukaryotes (Yuan and Wessler 
2011). They are encoded by diverse TEs present in most eukaryotic 
lineages, including mammals, plants, fungi, and 
amoebae (Hua-Van and Capy 2008). The name Mutator originates 
from the ability of active copies of this element to 
induce mutations corresponding to diverse recombination 
events, which were first described in plants (Bennetzen 
1984; Jiang et al. 2011) and later in other organisms 
(Amyotte et al. 2012). Among bacterial TEs, the IS256 
family was shown to be related to the eukaryotic Mutator-like 
elements (Mahillon and Chandler 1998; Hua-Van and 
Capy 2008). In this study, we further expanded the Mutator 
transposase superfamily in prokaryotes by the discovery of 
Fig. 7. Alignment of the predicted N-terminal DNA-binding domain implicated in p-MULT IR recognition. Transposases identified in this study were 
aligned by the MAFFT alignment software (Katoh and Standley 2013) and visualized using Jalview (Waterhouse et al. 2009). Only the predicted N2 domain 
encompassing the minimum IR-binding domain identified in the IS256 transposase (Hennig and Ziebuhr 2010) was retained in the alignment. The alignment 
was filtered for redundancy in order to keep a subset of transposases representative of the diversity of the p-MULT 1, 2, 3, and 4 families. Transposase accession 
numbers are indicated on the left. 



four additional families related to IS256, defining five p-MULT 
families (fig. 1). The majority of prokaryotic DDE transposases 
are associated with ISs. Although the IS256 (p-MULT 1) and 
ISH6 (p-MULT 2) families contain only ISs, the three other 
Mutator-like families display a wide variety of 
organizations (fig. 3). TEs encoding p-MULT 3 transposases include 
ISs and the diverse streptococcal ICEs of the TnGBS family 
(Guerillot et al. 2013). More interestingly, all TEs encoding 
p-MULT 5 transposases are Mycoplasma ICEs previously 
described or identified in this study (Marenda et al. 2006). 

The only other previously reported example of an association between a 
DDE transposase and a conjugation machinery is ICE6013 
(Han et al. 2009; Smyth and Robinson 2009). The combination 
of transposition with conjugation implies recombination 
constraints linked to the physical separation of donor and recipient 
molecules. The expansion of ICE families relying on 
different IS-related DDE transposases highlights that 
transposition via a circular intermediate overcomes these constraints 
by generating a molecular substrate compatible with 
conjugative transfer. As this mode of transposition is 
common to several widespread families of ISs, such as IS1, 
IS3, IS21, and IS30 (Polard et al. 1992; Turlan and Chandler 
1995; Kallastu et al. 1998; Kiss and Olasz 1999; Berger and 
Haas 2001), the association of transposition with conjugative 
transfer of DNA might be underestimated. Alternatively, the 
transposition process catalyzed by Mutator-like transposases 
might be particularly well suited to conjugative transfer, explaining 
why they are associated with two broad families of ICEs. 

TnGBSs were shown to replicate both in the donor strain 
following circularization and in the recipient strain upon their 
insertion in the chromosome. This replication is dependent on 
a plasmid-like replicase and promotes the transfer of the ICE 
(Guerillot et al. 2013). Mycoplasma ICEs show several features 
suggesting a transient replication. First, in ICEC(27343), 
Fig. 8. Transposase sequence signature associated with insertion specificity upstream of σA promoters. As in figure 6, the similarity score along the 
aligned putative transposase sequences was plotted from the MAFFT alignment using the plotcon software of the EMBOSS package. The similarity was 
calculated by moving a window of 10 aa residues along the aligned sequences and using the BLOSUM62 matrix. Two similarity plots were superimposed. The 
gray and black curves correspond to the similarity calculated on the alignment of p-MULT 3 plus ISKra4 transposases (p-MULT 4) and of p-MULT 3 plus 
ISAzba1 and ISMich2 transposases (p-MULT 4), respectively. The regions showing a higher conservation with ISKra4 transposases than with ISAzba1 and 
ISMich2 transposases are colored in gray. The protein motif of the region specifically conserved between p-MULT 3 and p-MULT 4 transposases of the ISKra4 
group, and putatively involved in the targeting of σA promoters, was generated by using WebLogo (Crooks et al. 2004). 



ICEM(95010), and ICEM(GM12), a sequence of 70-286 nt located 
between the putative transposase gene and the terminal IR 
shows more than 83% identity with the DNA region containing 
the putative single-strand origin of replication of the 
plasmid pMmc-95010 (Thiaucourt et al. 2011). Second, we 
identified in ICEC(27343), ICEM(GM12), and ICEA-III(5632) a parA 
homolog (fig. 3E). ParA proteins are implicated in the segregation 
of replicating plasmids (Lutkenhaus 2012). 
Therefore, a transient replication might be a common feature 
of ICEs relying for their mobility on a Mutator-like DDE 
transposase. 

No three-dimensional structure of a Mutator-like transposase 
is presently available. Nevertheless, secondary structure 
predictions showed that, as with other DDE transposases, 
the five p-MULT families show an RNase H fold organization 
(supplementary fig. S2, Supplementary Material online). This 
analysis also revealed features specifically shared between the 
eukaryotic Mutator-like transposases and their prokaryotic 
counterparts, such as a conserved C/D(2)H signature a few 
amino acids after the second aspartyl residue of the catalytic 
triad and a long α-helical insert between the second aspartyl 
and the glutamyl residues (fig. 5). Shared functional features 
further underscore the relationships between eukaryotic and 
prokaryotic Mutator-like transposases. Interestingly, as for 
IS256, TnGBS, and Mycoplasma ICEs, circular forms have 
been observed for eukaryotic MULEs, such as Mu1 and Mu1.7 of 
maize (Sundaresan and Freeling 1987) and the α3 MULE 
of the yeast Kluyveromyces lactis (Barsoum et al. 2010). Thus, 
this mode of transposition might be a unifying feature of the 
Mutator superfamily of transposases. 

The dramatic increase of genomic data provides opportunities 
to decipher the insertion specificity of transposable 
elements by comparing multiple insertion sites. We have characterized 
the diversity of insertion specificity among the five 
p-MULT families. The insertions of IS256 (p-MULT 1) and of 
Mycoplasma ICEs (p-MULT 5) appeared to be random. We 
have previously shown that TnGBS ICEs insert specifically upstream 
of σA promoters in a conserved orientation. We show 
here that this property is shared by both the ISLre2 family 
encoding p-MULT 3 transposases and several members of 
the ISKra4 group (p-MULT 4 family). By comparing transposase 
sequences from these two lineages, we identified a 
conserved motif predicted to be involved in this insertion 
specificity, which is atypical among prokaryotic TEs (fig. 8). By analogy with 
the integrase of the yeast retrotransposon Ty3, which interacts 
with the transcription factors TFIIIB and TFIIIC (Kirchner et al. 
1995), we proposed that the TnGBS transposase interacts 
with a subunit of the RNA polymerase initiation complex 
(Brochet et al. 2009). The location of this motif just upstream 
of the N2 domain, which interacts with the two IRs, is compatible 
with such a model. It has been suggested that transposase-mediated 
circularization of IS256 preferentially starts with a 
sequence-specific first-strand cleavage at the left IS terminus 
(Hennig and Ziebuhr 2010). Similarly, the asymmetric transposition 
of IS911 was shown to result from differential 
recognition of IRR and IRL by the transposase (Rousseau et al. 
2008). Based on the analysis of the orientation of insertion of 
the different TEs that target promoter regions, we propose 
that the asymmetric recognition of the two IRs is responsible 
for the specific orientation of most TEs encoding p-MULT 
3 transposases, with the IRR next to the targeted promoter 
sequence. 

All the retrieved members of the ISH6 (p-MULT 2) family 
except one were identified in archaea. Promoters in archaea 
are more similar to eukaryotic Pol II-dependent promoters, with 
an AT-rich TATA box-like element (Palmer and Daniels 1995). 
Interestingly, ISH6 preferentially targets an AT-rich motif 
(AANATNTT) that is duplicated upon transposition (supplementary 
table S2 and fig. S1, Supplementary Material 
online). Thus, this insertion specificity might also lead to a 
preferential insertion of these ISs in promoter or intergenic 
regions. Interestingly, Pack-MULEs, which are nonautonomous 
MULEs carrying fragments of cellular genes, have been shown 
to preferentially insert into the 5′ end of genes (Jiang et al. 
2011). Therefore, targeting promoter regions and avoiding 
transposition into genes seem to be a shared strategy 
among Mutator-like elements to limit the fitness cost to the 
host cell. 
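
To make the degenerate ISH6 target motif concrete, the Python sketch below locates occurrences of AANATNTT in a DNA sequence by expanding N to any base. It is a minimal illustration with hypothetical function names. It scans the given strand only and is not the analysis pipeline of this study.

```python
import re

# Only the IUPAC symbols needed for this motif are mapped here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def motif_to_regex(motif):
    """Compile a degenerate motif into a regex; the lookahead allows
    overlapping occurrences to be reported."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_motif(sequence, motif="AANATNTT"):
    """Return (0-based start, matched site) for each occurrence on this strand."""
    pattern = motif_to_regex(motif)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Toy sequence (invented for the example) containing two matching sites.
toy_dna = "GGAACATGTTCCAATATATTGG"
print(find_motif(toy_dna))
```

Scanning the reverse complement as well, and relating hits to annotated gene starts, would be needed before drawing any conclusion about promoter or intergenic preference.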

In conclusion, transposable elements encoding Mutator-like 
transposases are much more widespread and diverse in 
prokaryotes than previously thought. As in eukaryotes, they 
represent one major superfamily of transposable elements in 
prokaryotes. The late discovery of the expansion of this group 
probably results from the low protein sequence conservation, 
which was only revealed by using an extensive cascade 
PSI-Blast search. The comparative analysis of these elements 
revealed both unifying features in terms of the predicted structure 
and transposition mechanism and differences in 
terms of insertion specificity and organization. 

Supplementary Material 

Supplementary figures S1 and S2 and tables S1 and S2 are 
available at Genome Biology and Evolution online 
(http://www.gbe.oxfordjournals.org/). 



Acknowledgments 

The authors thank Alexandre Almeida and Pierre-Emmanuel 
Douarre for critical reading of the manuscript and Stephane 
Descorps-Declere for his help in bioinformatics. They also 
thank Carmen Buchrieser, Christine Citti, Patrick Trieu-Cuot, 
Violette Da Cunha, and Isabelle Rosinski-Chupin for fruitful 
discussions. This work was supported by the French National 
Research Agency (grant 2010-PATH-004-02) and the LabEx 
project IBEID. 

Literature Cited 

Altschul SF, et al. 1997. Gapped BLAST and PSI-BLAST: a new generation 
of protein database search programs. Nucleic Acids Res. 25:3389-3402. 

Amyotte SG, et al. 2012. Transposable elements in phytopathogenic 
Verticillium spp.: insights into genome evolution and inter- and 
intra-specific diversification. BMC Genomics 13:314. 

Aziz RK, Breitbart M, Edwards RA. 2010. Transposases are the most abundant, 
most ubiquitous genes in nature. Nucleic Acids Res. 38:4207-4217. 

Barsoum E, Martinez P, Astrom SU. 2010. Alpha3, a transposable element 
that promotes host sexual reproduction. Genes Dev. 24:33-44. 

Bennetzen JL. 1984. Transposable element Mu1 is found in multiple copies 
only in Robertson's Mutator maize lines. J Mol Appl Genet. 2:519-524. 

Berger B, Haas D. 2001. Transposase and cointegrase: specialized transposition 
proteins of the bacterial insertion sequence IS21 and related 
elements. Cell Mol Life Sci. 58:403-419. 

Bhadra R, et al. 2006. Cascade PSI-BLAST web server: a remote homology 
search tool for relating protein domains. Nucleic Acids Res. 34: 
W143-W146. 

Brochet M, Couve E, Glaser P, Guedon G, Payot S. 2008. Integrative 
conjugative elements and related elements are major contributors to 
the genome diversity of Streptococcus agalactiae. J Bacteriol. 190: 
6913-6917. 

Brochet M, et al. 2009. Atypical association of DDE transposition with 
conjugation specifies a new family of mobile elements. Mol 
Microbiol. 71:948-959. 

Calcutt MJ, Lewis MS, Wise KS. 2002. Molecular genetic analysis of ICEF, 
an integrative conjugal element that is present as a repetitive sequence 
in the chromosome of Mycoplasma fermentans PG18. J Bacteriol. 
184:6929-6941. 

Chandler M, Fayet O. 1993. Translational frameshifting in the control of 
transposition in bacteria. Mol Microbiol. 7:497-503. 

Cline MS, et al. 2007. Integration of biological networks and gene 
expression data using Cytoscape. Nat Protoc. 2:2366-2382. 

Cole C, Barber JD, Barton GJ. 2008. The Jpred 3 secondary structure 
prediction server. Nucleic Acids Res. 36:W197-W201. 

Crooks GE, Hon G, Chandonia JM, Brenner SE. 2004. WebLogo: a 
sequence logo generator. Genome Res. 14:1188-1190. 

Curcio MJ, Derbyshire KM. 2003. The outs and ins of transposition: from 
Mu to kangaroo. Nat Rev Mol Cell Biol. 4:865-877. 

Darriba D, Taboada GL, Doallo R, Posada D. 2011. ProtTest 3: fast selection 
of best-fit models of protein evolution. Bioinformatics 27:1164-1165. 

Dordet Frisoni E, et al. 2013. ICEA of Mycoplasma agalactiae: a new family 
of self-transmissible integrative elements that confers conjugative 
properties to the recipient strain. Mol Microbiol. 89:1226-1239. 

Eisen JA, Benito MI, Walbot V. 1994. Sequence similarity of putative 
transposases links the maize Mutator autonomous element and a group of 
bacterial insertion sequences. Nucleic Acids Res. 22:2634-2636. 



Genome Biol Evol 6(2):260-272. doi:10.1093/gbe/evu010 Advance Access publication January 13, 2014 






Endoh H, Hirayama T, Aoyama T, Oka A. 1990. Characterization of the 
virA gene of the agropine-type plasmid pRiA4 of Agrobacterium rhi- 
zogenes. FEBS Lett. 271:28-32. 

Filee J, Siguier P, Chandler M. 2007. Insertion sequence diversity in 
archaea. Microbiol Mol Biol Rev. 71:121-157. 

Guerillot R, Da Cunha V, Sauvage E, Bouchier C, Glaser P. 2013. Modular 
evolution of TnGBSs, a new family of ICEs associating IS transposition, 
plasmid replication and conjugation for their spreading. J Bacteriol. 
195:1979-1990. 

Han X, et al. 2009. Identification of a novel variant of staphylococcal 
cassette chromosome mec, type II.5, and its truncated form by insertion 
of putative conjugative transposon Tn6012. Antimicrob Agents 
Chemother. 53:2616-2619. 

Haren L, Ton-Hoang B, Chandler M. 1999. Integrating DNA: transposases 
and retroviral integrases. Annu Rev Microbiol. 53:245-281. 

Hennig S, Ziebuhr W. 2010. Characterization of the transposase encoded 
by IS256, the prototype of a major family of bacterial insertion sequence 
elements. J Bacteriol. 192:4153-4163. 

Hickman AB, Chandler M, Dyda F. 2010. Integrating prokaryotes and 
eukaryotes: DNA transposases in light of structure. Crit Rev Biochem 
Mol Biol. 45:50-69. 

Hua-Van A, Capy P. 2008. Analysis of the DDE motif in the Mutator 
superfamily. J Mol Evol. 67:670-681. 

Jiang N, Ferguson AA, Slotkin RK, Lisch D. 2011. Pack-Mutator-like transposable 
elements (Pack-MULEs) induce directional modification of 
genes through biased insertion and DNA acquisition. Proc Natl Acad 
Sci USA. 108:1537-1542. 

Jurka J, et al. 2005. Repbase Update, a database of eukaryotic repetitive 
elements. Cytogenet Genome Res. 110:462-467. 

Kallastu A, Horak R, Kivisaar M. 1998. Identification and characterization 
of IS1411, a new insertion sequence which causes transcriptional activation 
of the phenol degradation genes in Pseudomonas putida. 
J Bacteriol. 180:5306-5312. 

Katoh K, Standley DM. 2013. MAFFT Multiple Sequence Alignment 
Software Version 7: improvements in performance and usability. 
Mol Biol Evol. 30:772-780. 

Kirchner J, Connolly CM, Sandmeyer SB. 1995. Requirement of RNA po- 
lymerase III transcription factors for in vitro position-specific integration 
of a retroviruslike element. Science 267:1488-1491. 

Kiss J, Olasz F. 1999. Formation and transposition of the covalently closed 
IS30 circle: the relation between tandem dimers and monomeric circles. 
Mol Microbiol. 34:37-52. 

Kostriken R, Morita C, Heffron F. 1981. Transposon Tn3 encodes a 
site-specific recombination system: identification of essential sequences, 
genes, and actual site of recombination. Proc Natl Acad Sci USA. 
78:4041-4045. 

Loessner I, Dietrich K, Dittrich D, Hacker J, Ziebuhr W. 2002. 
Transposase-dependent formation of circular IS256 derivatives in 
Staphylococcus epidermidis and Staphylococcus aureus. J Bacteriol. 
184:4709-4714. 

Lutkenhaus J. 2012. The ParA/MinD family puts things in their place. 
Trends Microbiol. 20:411-418. 

Mahillon J, Chandler M. 1998. Insertion sequences. Microbiol Mol Biol Rev. 
62:725-774. 



Marenda M, et al. 2006. A new integrative conjugative element occurs in 
Mycoplasma agalactiae as chromosomal and free circular forms. 
J Bacteriol. 188:4137-4141. 

Montano SP, Pigli YZ, Rice PA. 2012. The Mu transpososome structure 
sheds light on DDE recombinase evolution. Nature 491:413-417. 

Morris JH, et al. 2011. clusterMaker: a multi-algorithm clustering plugin for 
Cytoscape. BMC Bioinformatics 12:436. 

Nagy Z, Chandler M. 2004. Regulation of transposition in bacteria. Res 
Microbiol. 155:387-398. 

Nagy Z, Szabo M, Chandler M, Olasz F. 2004. Analysis of the N-terminal 
DNA binding domain of the IS30 transposase. Mol Microbiol. 
54:478-488. 

Palmer JR, Daniels CJ. 1995. In vivo definition of an archaeal promoter. 
J Bacteriol. 177:1844-1849. 

Polard P, Prere MF, Fayet O, Chandler M. 1992. Transposase-induced excision 
and circularization of the bacterial insertion sequence IS911. 
EMBO J. 11:5079-5090. 

Rousseau P, Loot C, Turlan C, Nolivos S, Chandler M. 2008. Bias between 
the left and right inverted repeats during IS911 targeted insertion. 
J Bacteriol. 190:6111-6118. 

Siguier P, Filee J, Chandler M. 2006. Insertion sequences in prokaryotic 
genomes. Curr Opin Microbiol. 9:526-531. 

Siguier P, Gagnevin L, Chandler M. 2009. The new IS1595 family, its relation 
to IS1 and the frontier between insertion sequences and transposons. 
Res Microbiol. 160:232-241. 

Smyth DS, Robinson DA. 2009. Integrative and sequence characteristics of 
a novel genetic element, ICE6013, in Staphylococcus aureus. 
J Bacteriol. 191:5964-5975. 

Sundaresan V, Freeling M. 1987. An extrachromosomal form of the Mu 
transposons of maize. Proc Natl Acad Sci USA. 84:4924-4928. 

Tamura K, et al. 2011. MEGA5: molecular evolutionary genetics analysis 
using maximum likelihood, evolutionary distance, and maximum parsimony 
methods. Mol Biol Evol. 28:2731-2739. 

Thiaucourt F, et al. 2011. Mycoplasma mycoides, from "mycoides Small 
Colony" to "Capri". A microevolutionary perspective. BMC Genomics 
12:114. 

Turlan C, Chandler M. 1995. IS1-mediated intramolecular rearrangements: 
formation of excised transposon circles and replicative deletions. 
EMBO J. 14:5410-5421. 

Waterhouse AM, Procter JB, Martin DM, Clamp M, Barton GJ. 2009. 
Jalview Version 2 — a multiple sequence alignment editor and analysis 
workbench. Bioinformatics 25:1189-1191. 

Yuan YW, Wessler SR. 2011. The catalytic domain of all eukaryotic 
cut-and-paste transposase superfamilies. Proc Natl Acad Sci USA. 
108:7884-7889. 

Ziebuhr W, et al. 1999. A novel mechanism of phase variation of virulence 
in Staphylococcus epidermidis: evidence for control of the polysaccharide 
intercellular adhesin synthesis by alternating insertion and excision 
of the insertion sequence element IS256. Mol Microbiol. 32:345-356. 

Zomer AL, Buist G, Larsen R, Kok J, Kuipers OP. 2007. Time-resolved de- 
termination of the CcpA regulon of Lactococcus lactis subsp. cremoris 
MG1363. J Bacteriol. 189:1366-1381. 

Associate editor: Tal Dagan 






